Access row/col data via attributes #3045

Closed
wants to merge 3 commits into from

@ghost ghost commented Mar 14, 2013

Open issues:

  • .r and .c (which I like) don't generalize to dim > 2. Use a numbered scheme instead (a0, a1, etc.)?
  • No Panel support. No particular reason; I just never use them.
  • Setting columns has a caveat; I need your numpy-fu help here (see bottom).
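The numbered scheme floated in the first bullet could look roughly like the pure-Python sketch below. It is an illustration under stated assumptions, not this PR's code: `Labeled` and `AxisAccessor` are hypothetical names, and a dict keyed by N-tuples of labels stands in for an N-dimensional pandas object. Accessor `aK` selects a label on axis K and drops that axis, so the same pattern covers any number of dimensions.

```python
import re


class AxisAccessor:
    """Hypothetical accessor for one numbered axis (a0, a1, ...) of a Labeled."""

    def __init__(self, data, axis):
        self._data = data    # maps N-tuples of labels to cell values
        self._axis = axis    # which tuple position this accessor selects on

    def __getattr__(self, label):
        # Keep the cells whose label on this axis matches, dropping that axis
        # from the key, so the result has one dimension fewer.
        sub = {
            key[:self._axis] + key[self._axis + 1:]: value
            for key, value in self._data.items()
            if key[self._axis] == label
        }
        if not sub:
            raise AttributeError(label)
        return sub


class Labeled:
    """Hypothetical N-d container: numbered accessors instead of .r/.c."""

    def __init__(self, data):
        self._data = data
        self.ndim = len(next(iter(data)))

    def __getattr__(self, name):
        # a0, a1, ... map to axes 0, 1, ...; any other missing name is an error.
        match = re.fullmatch(r"a(\d+)", name)
        if match and int(match.group(1)) < self.ndim:
            return AxisAccessor(self._data, int(match.group(1)))
        raise AttributeError(name)


cube = Labeled({("x", "p", "k"): 1, ("x", "q", "k"): 2, ("y", "p", "k"): 3})
print(cube.a0.x)  # {('p', 'k'): 1, ('q', 'k'): 2}
print(cube.a1.p)  # {('x', 'k'): 1, ('y', 'k'): 3}
```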

Demo:

In [1]: import pandas as pd
   ...: from pandas.util.testing import makeCustomDataframe as mkdf
   ...: 
   ...: pd.options.display.notebook_repr_html=False

In [2]: df=mkdf(4,2,r_idx_nlevels=1)

In [3]: df
Out[3]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C1
R_l0_g1    R1C0    R1C1
R_l0_g2    R2C0    R2C1
R_l0_g3    R3C0    R3C1

In [4]: df.r.R_l0_g1 # tab-complete
Out[4]: 
C0
C_l0_g0    R1C0
C_l0_g1    R1C1
Name: R_l0_g1, dtype: object

In [5]: df.r.R_l0_g1 = df.r.R_l0_g0

In [6]: df
Out[6]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C1
R_l0_g1    R0C0    R0C1
R_l0_g2    R2C0    R2C1
R_l0_g3    R3C0    R3C1

In [7]: df.c.C_l0_g1 # tab-complete
Out[7]: 
R0
R_l0_g0    R0C1
R_l0_g1    R0C1
R_l0_g2    R2C1
R_l0_g3    R3C1
Name: C_l0_g1, dtype: object

In [8]: df.c.C_l0_g1 = df.c.C_l0_g0

In [9]: df
Out[9]: 
C0      C_l0_g0 C_l0_g1
R0                     
R_l0_g0    R0C0    R0C0
R_l0_g1    R0C0    R0C0
R_l0_g2    R2C0    R2C0
R_l0_g3    R3C0    R3C0
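The demo above hinges on three Python hooks: `__getattr__` for reads, `__setattr__` for writes, and `__dir__`, which IPython's tab completion typically consults through dir(). A minimal sketch of that mechanism, assuming a toy `Table` class (hypothetical, storing cells in a dict keyed by (row, column)) in place of a real DataFrame; the PR's actual implementation may differ:

```python
class _AxisProxy:
    """Hypothetical stand-in for df.r (axis 0) or df.c (axis 1) on a toy Table."""

    def __init__(self, table, axis):
        # object.__setattr__ bypasses our own __setattr__ below, which is
        # reserved for assigning data by label.
        object.__setattr__(self, "_table", table)
        object.__setattr__(self, "_axis", axis)

    def _labels(self):
        return self._table.rows if self._axis == 0 else self._table.cols

    def _key(self, label, other):
        return (label, other) if self._axis == 0 else (other, label)

    def __dir__(self):
        # Offer only labels that can actually be typed after a dot.
        return [lab for lab in self._labels() if lab.isidentifier()]

    def __getattr__(self, label):
        # Read a whole row (axis 0) or column (axis 1) as {other_label: value}.
        if label not in self._labels():
            raise AttributeError(label)
        others = self._table.cols if self._axis == 0 else self._table.rows
        return {o: self._table.cells[self._key(label, o)] for o in others}

    def __setattr__(self, label, values):
        # Write a {other_label: value} mapping into the named row or column.
        if label not in self._labels():
            raise AttributeError(label)
        for o, value in values.items():
            self._table.cells[self._key(label, o)] = value


class Table:
    """Hypothetical 2-D table with cells keyed by (row_label, col_label)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = list(rows), list(cols)
        self.cells = {(r, c): f"R{i}C{j}"
                      for i, r in enumerate(self.rows)
                      for j, c in enumerate(self.cols)}

    @property
    def r(self):
        return _AxisProxy(self, 0)

    @property
    def c(self):
        return _AxisProxy(self, 1)


t = Table(["R_l0_g0", "R_l0_g1"], ["C_l0_g0", "C_l0_g1"])
t.r.R_l0_g1 = t.r.R_l0_g0    # copy row 0 into row 1
t.c.C_l0_g1 = t.c.C_l0_g0    # copy column 0 into column 1
print(t.c.C_l0_g1)           # {'R_l0_g0': 'R0C0', 'R_l0_g1': 'R0C0'}
```

Returning a fresh proxy from each property keeps the accessor stateless; every read or write goes straight to the table's cells.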

MultiIndex example (note the recursive syntax):

In [1]: import pandas as pd
   ...: from pandas.util.testing import makeCustomDataframe as mkdf
   ...: 
   ...: pd.options.display.notebook_repr_html=False

In [2]: df=mkdf(4,2,r_idx_nlevels=2,c_idx_nlevels=2)

In [3]: df
Out[3]: 
C0              C_l0_g0 C_l0_g1
C1              C_l1_g0 C_l1_g1
R0      R1                     
R_l0_g0 R_l1_g0    R0C0    R0C1
R_l0_g1 R_l1_g1    R1C0    R1C1
R_l0_g2 R_l1_g2    R2C0    R2C1
R_l0_g3 R_l1_g3    R3C0    R3C1

In [4]: df.r.R_l0_g1.r.R_l1_g1 # tab-complete, twice
Out[4]: 
C0       C1     
C_l0_g0  C_l1_g0    R1C0
C_l0_g1  C_l1_g1    R1C1
Name: R_l1_g1, dtype: object

In [5]: df.c.C_l0_g0.c.C_l1_g0 # tab-complete, twice
Out[5]: 
R0       R1     
R_l0_g0  R_l1_g0    R0C0
R_l0_g1  R_l1_g1    R1C0
R_l0_g2  R_l1_g2    R2C0
R_l0_g3  R_l1_g3    R3C0
Name: C_l1_g0, dtype: object

In [6]: df=mkdf(4,2,r_idx_nlevels=2,c_idx_nlevels=2)

In [7]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1.c.C_l1_g1

In [8]: df
Out[8]: 
C0              C_l0_g0 C_l0_g1
C1              C_l1_g0 C_l1_g1
R0      R1                     
R_l0_g0 R_l1_g0    R0C1    R0C1
R_l0_g1 R_l1_g1    R1C1    R1C1
R_l0_g2 R_l1_g2    R2C1    R2C1
R_l0_g3 R_l1_g3    R3C1    R3C1
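The recursive syntax works if selecting an outer-level label returns a smaller object that carries the same accessor, so `.r` can be applied again to the next level. A pure-Python sketch of that idea; `RowFrame` and `_RowLevel` are hypothetical, rows are keyed by label tuples (one entry per index level), and plain strings stand in for row data:

```python
class RowFrame:
    """Hypothetical toy frame whose rows are keyed by label tuples, one per level."""

    def __init__(self, rows):
        self.rows = rows  # {(level0_label, level1_label, ...): row_data}

    @property
    def r(self):
        return _RowLevel(self)


class _RowLevel:
    """Hypothetical .r accessor that selects on the outermost remaining level."""

    def __init__(self, frame):
        self._frame = frame

    def __dir__(self):
        # Tab completion only needs the labels of the outermost level.
        return sorted({key[0] for key in self._frame.rows if key[0].isidentifier()})

    def __getattr__(self, label):
        # Keep rows under `label` and strip that level from their keys.
        sub = {key[1:]: row for key, row in self._frame.rows.items() if key[0] == label}
        if not sub:
            raise AttributeError(label)
        if list(sub) == [()]:
            # All levels consumed: return the row itself.
            return sub[()]
        # Levels remain: return a smaller frame so `.r` can be applied again.
        return RowFrame(sub)


frame = RowFrame({
    ("R_l0_g0", "R_l1_g0"): "row 0",
    ("R_l0_g1", "R_l1_g1"): "row 1",
})
print(frame.r.R_l0_g1.r.R_l1_g1)  # row 1
```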

Can you suggest a fix for the failing case below? I'm sure I'm just lacking numpy-fu here.

In [7]: df.r.R_l0_g0.r.R_l1_g0 = df.r.R_l0_g1.r.R_l1_g1 # this works
In [8]: df.r.R_l0_g0.r.R_l1_g0 = df.r.R_l0_g1 # this works
In [3]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1.c.C_l1_g1 # this works
In [5]: df.c.C_l0_g0.c.C_l1_g0=df.c.C_l0_g1 # this doesn't
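One plausible explanation for the asymmetry, assuming the setter ends up assigning into the underlying NumPy array (the setter code isn't shown here, so this is a guess): `df.r.R_l0_g1` yields a block of shape (1, 2) headed for a row of shape (2,), while `df.c.C_l0_g1` yields a block of shape (4, 1) headed for a column of shape (4,). NumPy assignment silently drops leading length-1 axes from the source, but not trailing ones. A small NumPy sketch with a 3x2 array of cell labels:

```python
import numpy as np

cells = np.array([["R0C0", "R0C1"],
                  ["R1C0", "R1C1"],
                  ["R2C0", "R2C1"]])

# Row case: the source block has shape (1, 2) and the target row has shape
# (2,). NumPy assignment drops the leading length-1 axis, so this works.
cells[0] = cells[1:2, :]

# Column case: the source block has shape (3, 1) and the target column has
# shape (3,). A trailing length-1 axis is not dropped, so this raises.
try:
    cells[:, 0] = cells[:, 1:2]
except ValueError as exc:
    print("column assignment failed:", exc)

# One possible fix: flatten the single-column block to 1-D first.
cells[:, 0] = cells[:, 1:2].ravel()
print(cells)
```

Calling .ravel() (or .squeeze(axis=1)) on a single-column block before assigning is one way to make the column case behave like the row case.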

@ghost
Author

ghost commented Mar 14, 2013

Designated for 0.12; need to update release.txt before merging.

@ghost
Author

ghost commented Mar 14, 2013

If this is merged and removal has support, add an issue to remove df.colname attribute access
in a future release.

@wesm
Member

wesm commented Mar 15, 2013

I'm -1 on removing df.foo_col, mainly because I find it really useful. The perf hit is acceptable.

@ghost
Author

ghost commented Mar 15, 2013

It wasn't the perf hit I was thinking of (is there any? getattr is only a fallback), but neatness.
I also find mixing object properties and data accessors in the same namespace messy, so I prefer
separating them out in a way that also treats columns and rows on an equal footing.

I'd prefer closing this rather than introducing yet another similar-but-different
way to do the same thing that will coexist forever.

This wasn't meant as just a feature; it's a cleanup.

@wesm
Member

wesm commented Mar 15, 2013

I'm for adding a general way to do what we're describing. The df. access is very convenient and harmless enough to stay, IMHO (plus it's familiar from record arrays). I'll take a closer look later.

@ghost
Author

ghost commented Mar 15, 2013

After more thought: df.colX is heavily used in boolean indexing and would be bad to give up.
Frames with labeled rows that are also valid identifiers (no spaces or punctuation) do occur, but
they are not that common, unlike column names.
This also doesn't generalize well to other index types, specifically a DatetimeIndex, whose
Timestamp labels begin with a digit (the year).

Withdrawn.
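The limitation described here comes from Python syntax: obj.label only parses when the label is a string that is a valid identifier and not a keyword. A small stdlib check along those lines (`attribute_accessible` is a hypothetical helper, and a `datetime.date` stands in for a pandas Timestamp, whose string form likewise starts with the year):

```python
import keyword
from datetime import date


def attribute_accessible(label):
    """Return True if `label` could be typed as `obj.label` in Python source."""
    return (
        isinstance(label, str)
        and label.isidentifier()
        and not keyword.iskeyword(label)
    )


labels = ["R_l0_g0", "my col", "class", str(date(2013, 3, 14)), 42]
for label in labels:
    print(repr(label), attribute_accessible(label))
```

Labels that fail this check can still be reached by indexing or getattr(obj, label), just not with dot syntax, which is what makes row attribute access patchy for date-indexed frames.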

@ghost ghost closed this Mar 15, 2013
@ghost ghost deleted the feature/data_access_via_attrib branch December 20, 2013 15:58
This pull request was closed.